Wildfires in the US

Statistical computation and visualization (MATH-517)

Zineb AGNAOU https://github.com/ZinebAg , Fahim BECK https://github.com/FahimBeck , Salima JAOUA https://github.com/salimajaoua , Matias JANVIN https://github.com/matiasjanvin , Seorim PARK https://github.com/seorimpark
November 19, 2021

Introduction

Wildfires are uncontrolled fires that burn in wildland vegetation, often in rural areas. They are not limited to a particular continent or environment, and they have burned many kinds of ecosystems on Earth for hundreds of millions of years (André Gabrielli 2019). Wildfires are a concern all over the world, alongside climate change and the preservation of nature and ecosystems. There have been several major wildfires recently, growing in number and severity: for instance, the 2020 California wildfires became one of the largest wildfire seasons in California history (Holly Yan, Cheri Mossburg, Artemis Moshtaghian and Paul Vercammen 2020), with several million acres burnt (Topher Gauk-Roger, Stella Chan, Jason Hanna and Steve Almasy 2020). Turkey went through its worst wildfire season in July and August 2021 (Mert Ozkan and Ezgi Erkoyun 2021), and the 2019-2020 bushfires in Australia (also known as the Black Summer) killed several billion animals. Many of them belonged to endangered species, some of which were believed to have been driven to extinction by this event (Michael Slezak 2020). Many countries therefore aim to minimize the size and number of wildfires, since they can cause many direct and indirect human fatalities (Steven Reinberg 2021), as well as air pollution (Sarah Gibbens 2021) and the loss of ecosystems and biodiversity (as in the Black Summer).

In order to reduce the number and severity of wildfires, it is necessary to understand the main factors behind them. Fires are very often caused accidentally (burning debris, agricultural activities, campfires, smoking) or intentionally (arson, children). Although the latter case can be prevented, human or non-human accidents can always happen and are hard to predict (Wildfire Causes, n.d.). However, we can suspect that certain natural conditions make such accidents more likely and help fires grow larger. By identifying these conditions, the states can build a strategy to suppress wildfires efficiently once they start, and prepare for locations that are highly likely to catch fire at a certain period of the year.

In addition, those factors change over time, creating better or worse conditions for wildfires. For instance, global warming is strongly suspected to be one of the main reasons why wildfires have become more frequent in recent years (Alejandra Borunda 2021). Countries have been undergoing climate change, and the resulting unexpected events may be at the origin of the recent disasters.

Research questions

The purpose of this investigation is to give an answer to the following question:

What are the main factors that affect the propagation of wildfires within the United States?

The investigation consists in answering the following subquestions:

  1. How has the number of fires in the United States of America evolved over time from 1993 to 2015?

  2. How do the land covers vary over time?

  3. How are fires distributed across the land covers and meteorological factors?

Approaches

To answer the above questions we proceed as follows. We take a descriptive and visual approach, producing interactive plots that present the data along different dimensions: geographically, temporally and by factor. First, a descriptive analysis studies the evolution of wildfire cases and its relation to the meteorological factors. Next, the distribution of land covers is studied, together with the variation of land covers at a fixed location, since changes have been noticed in some areas. We then analyze the number of fires with respect to land covers and meteorological parameters. After this analysis, we use both sets of characteristics to build one model that predicts the number of fires in a grid cell and another that predicts the aggregate burnt area in a grid cell. One part is devoted to modeling the effect of wind on the spread of wildfires. After noticing that the correlation between parameters was low, we used subsets and performed a quantile regression.

Sources of information / datasets

To perform this analysis, a dataset for the United States from 1993 to 2015 will be used. It contains 563,983 rows with 37 columns. The columns are the following:

Please note that the area proportions \(lc1\) to \(lc18\) do not always sum to exactly 1 for each pixel and month since a few classes with quasi-0 proportion have been removed.

Since the original data were provided in the context of a prediction competition with the University of Edinburgh, there are 8,000 missing values in each of the \(CNT\) and \(BA\) columns. The missing values are not necessarily located in the same rows for the two features.

Exploratory Data Analysis

When considering only rows without missing values, 452,930 rows remain.

Table (1): Summary of features CNT, BA and the sum of lcs by row
Statistic     Min     Pctl(25)   Median   Pctl(75)   Mean      Max
CNT           0       0          0        2          2.280     359
BA            0       0          0        1.6        158.898   538,054
sum_of_lcs    0.822   0.997      0.999    1.000      0.997     1.000

As shown in Table (1), wildfires remain relatively rare events. At least 75% of the location-months considered have at most two fires, since the 75th percentile of \(CNT\) is 2, and at least half have none. The same applies to \(BA\), the aggregated burnt area, whose distribution is strongly positively skewed: its median is 0 while its mean is 158.898.

As stated in the data description, the proportions of the 18 land covers do not always add up to one. Figure 1 and Table (1) show that the minimum value of the sum is 0.82. The first quartile shows that only 25% of the rows have a sum below approximately 0.997. We therefore keep the data as they are, considering the sums close enough to one.

Figure 1: Histogram representing the sum of land covers by row from 1993 to 2015. The histogram is negatively skewed

Figure 2: Distribution of land covers from 1993 to 2015

Figure 2 displays the distribution of the land covers using boxplots. Most land covers represent less than 10% of the location considered, which shows that the areas considered are diverse.

Some transformations were applied to the features. First, the temperature was converted from Kelvin to Celsius. Next, the U-component of wind (the wind speed in the eastward direction) and the V-component of wind (the wind speed in the northward direction) were aggregated using the Euclidean norm of the vector: \[W\hspace{-2pt}speed=\sqrt{{W\hspace{-2pt}speed_{East}}^2 + {W\hspace{-2pt}speed_{North}}^2 }\] with \(W\hspace{-2pt}speed\) the wind speed.
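As a minimal sketch of these two transformations, the snippet below uses plain Python. The function names and the sample values are illustrative assumptions and are not taken from the project code.

```python
import math

def kelvin_to_celsius(t_kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return t_kelvin - 273.15

def wind_speed(u_east, v_north):
    """Euclidean norm of the (eastward, northward) wind components."""
    return math.hypot(u_east, v_north)

# Hypothetical values for one grid cell and month.
temperature_c = kelvin_to_celsius(290.0)
speed = wind_speed(3.0, -4.0)  # a (3, -4) wind vector has norm 5
print(temperature_c, speed)
```

The sign of each component only encodes the direction, so the resulting speed is always non-negative.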

Visualisation and wildfires over time

As a descriptive analysis of the data, and before exploring the different factors, we first plot on a map the number of wildfires (denoted \(CNT\)) and the resulting burnt area (denoted \(BA\)) over time. The objective is to determine which states are the most affected by wildfires and to identify when fires happen the most.

The dataset contains a list of coordinates in the United States. To determine which state each coordinate belongs to, the Python library \(\it{reverse\_geocoder}\) (link) was used. This library returns the closest address for given coordinates. We extracted the name of the state from this address, summed the values per state and stored the sums in a dictionary keyed by state. \(CNT_i\) and \(BA_i\) denote the values of the dictionary for a state i in the U.S.A. Let us also denote by \(CNT_k\) and \(BA_k\) the values for a given coordinate k.
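The per-state aggregation described above can be sketched as follows. This is not the project's code: the lookup `state_of` is a hypothetical callable standing in for the reverse geocoding step (with \(\it{reverse\_geocoder}\), to our understanding, this would be a call to its `search` function followed by reading the `admin1` field of the result), and the rows are made-up values with missing entries assumed to be already removed.

```python
from collections import defaultdict

def aggregate_by_state(rows, state_of):
    """Sum CNT and BA per state.

    rows     : iterable of (lat, lon, cnt, ba) tuples
    state_of : callable (lat, lon) -> state name; a hypothetical
               stand-in for the reverse geocoding lookup
    """
    cnt_by_state = defaultdict(float)
    ba_by_state = defaultdict(float)
    for lat, lon, cnt, ba in rows:
        state = state_of(lat, lon)
        cnt_by_state[state] += cnt
        ba_by_state[state] += ba
    return dict(cnt_by_state), dict(ba_by_state)

# Toy lookup table replacing the geocoder (assumption for illustration).
toy_states = {
    (34.0, -118.5): "California",
    (34.5, -119.0): "California",
    (31.0, -100.0): "Texas",
}
rows = [
    (34.0, -118.5, 2, 10.0),
    (34.5, -119.0, 1, 0.5),
    (31.0, -100.0, 4, 1000.0),
]
cnt_i, ba_i = aggregate_by_state(rows, lambda lat, lon: toy_states[(lat, lon)])
print(cnt_i)
print(ba_i)
```

Passing the lookup as a parameter keeps the aggregation independent of the geocoding library, which makes it easy to test on toy data.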

To better visualize the data, several adjustments were made. First, the number of fires and the burnt area were divided by the total area of the state to make states comparable. This value was then multiplied by \(10^5\) (for \(CNT\)) or \(10^4\) (for \(BA\)) in order to get the number of wildfires per \(10^5km^2\) or the burnt area per \(10^4km^2\) respectively. The resulting numbers range from the order of \(10^{-2}\) to \(10^{3}\), so a log scale was applied in order to have a reasonable color scale across states. The final numbers for the plots are calculated as follows:

\[Final\_CNT_i=\log_2\left(\frac{10^5\sum\limits_{k\in i}CNT_k}{total\_area\_of\_i}+1\right)\] \[Final\_BA_i=\log_2\left(\frac{10^4\sum\limits_{k\in i}BA_k}{total\_area\_of\_i}+1\right)\] for i a state in the U.S.A., where the sums run over all coordinates k located in state i.

Values close to 0 would map to large negative numbers on the log scale, so 1 is added before taking the log to keep the scale starting at 0 and avoid meaningless outliers.

In addition, we plotted the number of fires or the burnt area at each location as red scatter points with a size scale. The log scale was again applied to adjust the sizes, as follows:

\[Local\_CNT_k=4\log_2\left(CNT_k+1\right)\]

\[Local\_BA_k=2\log_2\left(BA_k+1\right)\]

for k a given coordinate of the dataset.
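The scaling formulas above can be sketched in a few lines of Python. The function names and the example state (70 fires and 3,000 acres burnt over 100,000 km²) are hypothetical and only serve to illustrate the arithmetic.

```python
import math

def final_value(total, state_area_km2, scale):
    """State-level color value: log2(scale * total / area + 1)."""
    return math.log2(scale * total / state_area_km2 + 1)

def local_cnt(cnt_k):
    """Marker size for the number of fires at one coordinate."""
    return 4 * math.log2(cnt_k + 1)

def local_ba(ba_k):
    """Marker size for the burnt area at one coordinate."""
    return 2 * math.log2(ba_k + 1)

# Hypothetical state: 70 fires and 3,000 acres burnt over 100,000 km^2.
final_cnt = final_value(70, 100_000, 1e5)   # equals log2(70 + 1)
final_ba = final_value(3000, 100_000, 1e4)  # equals log2(300 + 1)
print(final_cnt, final_ba, local_cnt(3), local_ba(7))
```

Adding 1 inside the logarithm means that a state or coordinate without any fire gets the value 0 rather than minus infinity.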

Several interactive maps were made in Python with the chosen scaling methods, but due to technical issues¹, deploying the maps with an external link was not possible. However, the interactive maps in the files \(\it Visualisation\_general.ipynb\) and \(\it Visualisation\_specific.ipynb\) can still be run locally, provided that the required libraries are installed. Figures 3, 5, 6, 7 and 8 are animated plots of the interactive map for different time frames and modes (\(CNT\) or \(BA\)).

Figure 3 gives an overview of one option that can be chosen for the plot. The color scale shows the burnt area for each state, and the scattered circles show the number of fires at each coordinate.

Figure 3: Overview of the interactive map, BA in colors and CNT in scatter circles

One observation from these plots is that the number of fires does not always match the burnt area. For instance, in June 1993, Texas (the state at the very bottom, in the middle of the map) had about \(10^3\) acres burnt per \(10^4km^2\), but as the number and size of the red dots in Texas in Figure 4 show, the number of fires was small relative to the burnt area.

Figure 4: Plot in 06/1993, BA in colors and CNT in scatter circles

Figure 5 displays, for each month, the mean number of fires over all years. One noticeable aspect is that the states at the edge of the country are the most likely to have a high number of fires. A phenomenon similar to the “eye of the storm” appears at the center and moves around. At the beginning of spring, the states on the east side are the most affected; as time goes by, the “eye of the storm” moves to the east, and the states in the west end up being the most affected.

Figure 5: Animated map of mean of number of wildfires from 1993 to 2015 at each month

Figure 6 displays, for each month, the mean burnt area over all years. Compared to Figure 5, we recover more or less the same phenomenon, with more dramatic changes. For example, the value for Nevada grows from about \(6.88\) acres per \(10^4km^2\) in March to \(5.80\times 10^3\) acres per \(10^4km^2\) in August, while the number of fires grows from \(3.02\) to \(67.62\) over the same period. By the end of summer, the impact of wildfires is huge on the west side of the country. In August, almost half of the land has more than \(2^8\) acres burnt per \(10^4km^2\).

Figure 6: Animated map of mean of burnt area from 1993 to 2015 at each month

Figure 7 displays, for each year, the mean number of fires from March to September. The evolution is heterogeneous across the years but, excluding 2015, the “eye of the storm” phenomenon is apparent, with Kansas and the states around it almost untouched and the states at the edge the most affected.

Figure 7: Animated map of mean of number of wildfires at each year

Figure 8 displays, for each year, the mean burnt area from March to September. We recover more or less the same result as in Figure 7, except that the eye moves from Kansas to Illinois. The severity of wildfires seems almost periodic: in 1997, 2004 and 2010 the fires are less severe, whereas in 2000, 2007 and 2012 they are far more severe.

Figure 8: Animated map of mean of burnt area at each year

We can recover the observation of Figure 8 by plotting the total burnt area (\(\sum\limits_{i}BA_i\), summed over all states i) against the year. Figure 9 shows the evolution of the total burnt area in the U.S. from 1993 to 2015. The numbers indeed appear roughly periodic. In addition, the rolling mean with a window of 4 increases slightly, which suggests that fires are becoming more severe on average. Figure 10 shows the evolution of the total number of wildfires (\(\sum\limits_{i}CNT_i\)). The rolling mean, computed with the same window, seems to oscillate without any particular increasing or decreasing trend over time.
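For readers unfamiliar with the rolling mean used above, here is a minimal sketch in plain Python. It assumes a trailing window (each value averages the current year and the three previous ones); whether the project used a trailing or a centered window is not stated, and the yearly totals below are made up.

```python
def rolling_mean(values, window=4):
    """Trailing rolling mean.

    Positions with fewer than `window` observations available
    are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Made-up yearly totals, for illustration only.
yearly_total = [10.0, 30.0, 20.0, 40.0, 60.0, 20.0]
smoothed = rolling_mean(yearly_total, window=4)
print(smoothed)
```

Averaging over four consecutive years damps the year-to-year oscillation, which makes a slow upward or downward drift easier to see.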

Figure 9: Total burnt area with respect to time

Figure 10: Total number of wildfires with respect to time

Visualisation of Wildfires considering different meteorological factors and their relations

In order to visualize the meteorological factors and compare them with the occurrences of wildfires, an interactive plot was made (link). The plot shows the coordinates of the United States, colored on a continuous scale according to the value of the chosen meteorological factor. In this app you will be able to select the year, the month, the meteorological factor whose distribution is shown (Wind Direction, Dew Temperature, Temperature, Potential Evaporation, Evaporation, Solar Radiation, Thermal Radiation, Pressure and Precipitation) and the feature to display on top of the distribution (None, CNT or BA), which is represented by red circles. The radii of the circles are linear in the number of wildfires, whereas for the burnt area we use \(\log_2(BA+1)\) for better visualization. Below is a descriptive analysis of each factor with respect to the occurrences of wildfires, taking May 2012 as an example. For each factor, we compare one plot without wildfire information, one with the number of wildfires and one with the burnt area.

Temperature

Figure 11 is a gif containing three different graphs: first the distribution of temperature across the country in May 2012, then the same plot with the number of fires added, and lastly the same plot with the burnt area added on top.

The plot shows that temperatures are highest in the south of the country, near the Mexican border, where they reach 28.15 degrees Celsius, whereas the coldest state is Montana, with an average of 1.37 degrees Celsius during May.

From a preliminary visual analysis and as expected, fires occur in the hottest regions. In the same way, the largest burnt areas cluster in the hottest areas. A more rigorous analysis of this relation follows later.

Figure 11: Gif of the temperature distribution across the States and how it visually interacts with the fires and burnt area.

Pressure

Figure 12: Gif of the pressure distribution across the States and how it visually interacts with the fires and burnt area

In the same way as Figure 11, Figure 12 represents the pressure distribution and how it interacts with the different attributes. A first strong observation when comparing Figures 11 and 12 is that the distributions of the two meteorological factors are extremely similar. Indeed, areas with lower temperatures coincide with areas of lower pressure, and conversely. This is consistent with the ideal gas law, which states that: \[P V = n R T \] where P, V and T are the pressure, volume and temperature; n is the amount of substance; and R is the ideal gas constant. At fixed volume and amount of gas, pressure is proportional to temperature, which suggests why pressure and temperature appear so strongly correlated visually.

Concerning the number of fires and burnt areas, the same conclusion as for the temperature can be applied.

Dewpoint Temperature

Figure 13: Gif of the dewpoint temperature distribution across the States and how it visually interacts with the fires and burnt area.

Figure 13 is a group of plots of the distribution of the dewpoint temperature (the temperature, measured 2m above the ground, to which air must be cooled to become saturated with water vapor) across the United States, laid out in the same way as Figure 11.

The dewpoint temperature is related to humidity: a location with a high dewpoint temperature is more humid. On the map, the regions in the south-east tend to have a high dewpoint temperature, meaning they are more humid than other regions. Wildfires nonetheless occur in highly humid regions as well, although the burnt areas are larger overall in the dry western regions.

Wind

Figure 14: Gif of the wind distribution across the States and how it visually interacts with the fires and burnt area

Figure 14 displays the wind directions and intensities using arrows that differ in direction and length. Due to the large number of data points, only every other arrow is displayed.

We first note that the wind blows from south to north, and from both the west and the east towards the center of the country.

The plot also shows that in areas with stronger wind, for example on the West Coast and in the far south, fires are more frequent and burnt areas are larger.

Recall that wind results from pressure differences between areas; the concentration of fires thus echoes the visual analysis made earlier for the pressure.

Evaporation

Figure 15: Gif of the evaporation distribution across the States and how it visually interacts with the fires and burnt area

Figure 15 displays the distribution of evaporation across the country. Notice how this distribution again closely matches the one seen in Figure 11.

One can still notice that, despite the high temperatures on the West Coast, the evaporation (of water) there is much smaller than in Florida, for example. This may come from the fact that we are isolating a single feature rather than looking at the impact of all available features and their interactions.

Figure 16: Gif of the potential evaporation distribution across the States and how it visually interacts with the fires and burnt area

Figure 16 displays the distribution of potential evaporation (the amount of evaporation of water that would take place if a sufficient source of water were available) across the United States. The plot is very different from Figure 15, where the actual amount of evaporation was measured. Although the occurrences of wildfires are spread out, the serious ones with larger burnt areas tend to occur in places with higher potential evaporation (except for the cases in the middle of the country).

Solar and thermal radiation

Figure 17: Gif of the solar radiation distribution across the States and how it visually interacts with the fires and burnt area

Figure 18: Gif of the thermal radiation distribution across the States and how it visually interacts with the fires and burnt area

Figures 17 and 18 show the distributions of solar radiation (net flux of shortwave radiation; mostly radiation coming from the sun) and thermal radiation (net flux of longwave radiation; mostly radiation emitted by the surface) across the United States. We can first verify that the higher the solar radiation, the lower the thermal radiation. We can also observe that the distribution of solar radiation is similar to that of evaporation (Figure 15), and that the distribution of thermal radiation is close to that of the dewpoint temperature (Figure 13).

We can draw a conclusion similar to the dewpoint temperature case: the occurrences of wildfires do not seem to be strongly related to radiation, but in the western regions the burnt areas seem to follow the regions where solar radiation is high (and thermal radiation is low).

Precipitation

Figure 19: Gif of the precipitation distribution across the States and how it visually interacts with the fires and burnt area

Figure 19 displays the distribution of precipitation across the United States. The regions in the west tend to receive less rain, and they show some large wildfires. There is however an exception: the west side of Minnesota, where precipitation was among the highest in the country, still had many severe wildfires. We can conclude that the amount of precipitation alone does not always determine the number and size of wildfires.

Wildfires and the land cover

Correlation between Land covers and Wildfires

Figure 20: Correlation plot for the fire occurrences and the different land covers

The next step is to determine whether there is a significant correlation between the different land covers and the occurrence of fires. To do so, we use the data with missing values filtered out. As can be seen in Figure 20, the correlation between fires and land covers remains very low. Considering a correlation strong when its absolute value exceeds 0.8, there is no strong correlation among the features, nor between the features and the occurrence of fires.

We can nonetheless note that, although small, the correlation between the occurrence of fires and cropland rainfed herbaceous cover, mosaic cropland, shrubland, grassland, bare areas and water is negative, while the correlation with mosaic natural vegetation, tree broadleaved evergreen closed to open, tree broadleaved deciduous closed to open, tree needleleaved evergreen closed to open, tree needleleaved deciduous closed to open, tree mixed, mosaic tree and shrub, sparse vegetation, tree cover flooded fresh or brackish water, shrub or herbaceous cover flooded and urban areas is positive.

Land cover over time and location

This section addresses the distribution of land covers over time. To do so, the geographical coordinates of Malibu, California have been selected. This area has been heavily impacted by wildfires in recent years. “In November 2018, the wealthy coastal enclave of Malibu was engulfed by the Woolsey Fire, which spread to over 96,000 acres of land outside of LA and is now 35% contained. At least two people were pronounced dead in Malibu on Friday” (Aria Bendix 2018). Figure 21 shows the exact location of the studied point on the map of the United States.

Figure 21: Location considered in the analysis here is Malibu, CA

Figure 22 shows the distribution of the land covers over time. The proportions are relatively stable and consistent over time. The predominant cover is the \(urban area\), with 31% of the surface in 1993; this proportion only increases with time.

Figure 22: Distribution of land covers across time from 1993 to 2015 in Malibu, CA

Although the proportions of the different land covers seem consistent in this particular example of Malibu, this is not always the case. Taking the coordinates in Wisconsin shown in Figure 23, it can be seen that the proportions change drastically across the years. As seen in Figure 24, the percentage of land representing \(tree broadleaved evergreen closed to open\) increases with time until it becomes the predominant one, while \(shrub or herbaceous cover flooded\) decreases drastically. We will not analyze the reasons and parameters that may have driven these changes, but will try to quantify them in the next section.

Figure 23: Location considered in the analysis, here in Wisconsin

Figure 24: Distribution of land cover across time in Wisconsin

To access the interactive part, click this link. In this app you will be able to select the longitude and latitude and see the distribution of the land covers.

Predominant Land covers Analysis and Shifts

Here the territory is represented according to its predominant type of land cover. Figure (25) shows this arrangement in 2000: \(water\) dominates the maritime borders and \(tree broadleaved\) is predominant in the east of the country, while the west is dominated by tree needleleaved. Large cities such as New York and Los Angeles are easily identifiable by their predominance of urban space.

Figure 25: Predominant land covers in 2000 by location

To access the interactive part, click this link. In this app you will be able to select the year, and the predominant land cover will be displayed directly on the US map.

The occurrence of fires at a location, compared with its predominant land cover, can also be an interesting analysis.

Figure 26: Mean number of fires in the United States of America in 2000 by location

In Figure (26), the highest occurrence of fires is found in Florida and Georgia in the South East, in New Jersey on the East Coast and all along the West Coast. These areas are characterised by the predominance of \(broadleaved evergreen closed to open\) trees. This observation can be explained by the effect of climbing fire: easily flammable land covers, e.g. \(grassland and shrubs\), act as a ladder to taller land covers with higher fuel capacity, such as trees. Usually, “the forests are more prone to the fire only when there is a particularly low near-surface SM (soil moisture), most likely from moderate to extensive drought” (Schaefer, Alexander J. and Magi, Brian I. 2019).

It can also be observed that proximity to urban land cover is correlated with the occurrence of wildfires (for instance near New York City and Los Angeles), suggesting that human activity can trigger wildfires. Another commonly known trigger of wildfires is the presence of dry thunderstorms with high activity of cloud-to-ground flashes. This trigger is usually the cause of the higher frequency of big wildfires on the Pacific Coast.

The combination of \(grassland\) with \(needle-leaved trees\) might be dangerous, as can be seen in Figure (25) and Figure (26).

The presence of \(needle-leaved trees\) is crucial for wildfires, since the trees are closer to each other and the dead branches on the ground, which contain sap, provide enough fuel for wildfires to spread further. Moreover, the land cover of \(deciduous trees\) can be especially dangerous in the early spring. The article by Barros et al. (Barros, Ana M. G. and Pereira, José M. C. 2014), which studies fire selectivity in Portugal, suggests that the selectivity for wildfire counts is higher for \(shrublands, grasslands and conifers\), but lower for agricultural areas such as \(cropland\). This suggestion seems to be supported in the US by the early observations from Figures (25) and (26).

Figure 27: Number of shifts of predominant land cover by location

To analyze the variation, we create a vector whose values start at zero for every location and increase by one each time the predominant land cover changes from one month to the next. It is first interesting to note that all locations change their predominant land cover at least once: in Figure (27), the shift values start at 1. They also do not exceed 5, which is still relatively low considering that the measurement is taken over 22 years. Note that we will not investigate the reasons behind these changes in this project.
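The shift counter described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the project's code: the helper names are hypothetical, each time step is given as a list of land cover proportions, and ties are resolved in favor of the first class.

```python
def predominant(lc_props):
    """1-based index of the land cover with the largest proportion."""
    return max(range(len(lc_props)), key=lambda j: lc_props[j]) + 1

def count_shifts(series):
    """Number of changes of predominant land cover between
    consecutive time steps at one location."""
    dominant = [predominant(p) for p in series]
    return sum(1 for a, b in zip(dominant, dominant[1:]) if a != b)

# Toy location with three land cover classes over five time steps.
series = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.30, 0.50, 0.20],  # class 2 becomes predominant: first shift
    [0.30, 0.50, 0.20],
    [0.60, 0.20, 0.20],  # class 1 is predominant again: second shift
]
print(count_shifts(series))
```

Only changes of the predominant class are counted, so a class can gain or lose a large share of the area without producing a shift as long as it stays the largest.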

Land covers and meteorological correlation

We want to understand the correlation between land covers and meteorological variables. To do so, we first clean the data, rename the variables by their description so that the results are easier to analyze, and then compute the correlation between the variables. The problem is that when trying to output the matrix, we get a warning saying that the matrix is too big. To get a glimpse of which variables may be highly related, we output the correlation between a land cover and a meteorological variable only when its absolute value exceeds a manually set threshold.

[1] "cor(11,7)=-0.722896324256492"

Here, we choose a threshold of 0.6, and only one correlation is displayed. This means that all the others are less than 0.6 in absolute value. The correlation between shrubland and the surface net thermal radiation is -0.72. This seems plausible: the more radiation is emitted by the surface, the less shrubland we find. But the results are not as good as expected. In fact, when printing all the correlations (threshold = 0), we see that the values are very small. This can mean either that the variables are not correlated or that the dependence is not linear. The first option seems less likely, so we compute the correlation with the Spearman method to see whether the relation between the variables is monotonic.

[1] "cor(11,7)=-0.626147167673006"
[1] "cor(16,8)=0.691392764491666"
[1] "cor(18,8)=0.6009781819297"

With the Spearman method, land cover variable 11 (shrubland) and the surface net thermal radiation are again correlated, as before. In addition, water and urban areas are each correlated with surface pressure. The explanations proposed here are tentative: the pressure in water is higher than the atmospheric pressure and increases with depth, and air pollution may explain the positive correlation between urban land and surface pressure. To visualize these results and try to show more, we regroup the data so that we can compute a correlation matrix, again with the Spearman method. This consists of summing together the land cover variables that share common characteristics.
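The thresholded screening described above, first with Pearson and then with Spearman correlation, can be sketched in plain Python. The report's output comes from R; the functions, variable names and toy data below are illustrative assumptions only.

```python
def pearson(x, y):
    """Pearson correlation (undefined if x or y is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def strong_pairs(land_covers, meteo, corr, threshold=0.6):
    """Pairs (land cover, meteo variable, value) with |corr| > threshold."""
    out = []
    for lc_name, lc in land_covers.items():
        for m_name, m in meteo.items():
            c = corr(lc, m)
            if abs(c) > threshold:
                out.append((lc_name, m_name, c))
    return out

# Toy data: a monotonic but strongly non-linear relation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 5 for v in x]
print(pearson(x, y), spearman(x, y))
print(strong_pairs({"lc_toy": x}, {"meteo_toy": y}, spearman))
```

On the toy data the Spearman coefficient reaches 1 while the Pearson coefficient stays below it, which illustrates why switching methods can reveal monotonic relations that a linear measure understates.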

Figure 28: Correlation heatmap after regrouping the land covers with meteorological variables
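As a rough sketch of the regrouping step, related land cover proportions can be summed column-wise before computing correlations. The toy values and the two groups below are hypothetical, since the actual grouping is not listed in this section.

```r
# Hypothetical land cover proportions for three grid cells.
df <- data.frame(
  lc1 = c(0.10, 0.00, 0.25),
  lc2 = c(0.20, 0.05, 0.25),
  lc3 = c(0.30, 0.15, 0.00),
  lc4 = c(0.40, 0.80, 0.50)
)

# Hypothetical grouping of similar land cover classes.
groups <- list(cropland = c("lc1", "lc2"), trees = c("lc3", "lc4"))

# Sum the proportions of the member classes within each group.
grouped <- as.data.frame(
  lapply(groups, function(cols) rowSums(df[, cols, drop = FALSE]))
)
print(grouped)
```

The grouped columns can then be passed to `cor(..., method = "spearman")` together with the meteorological variables.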

This technique has been useful to represent our previous result. In Figure 28, evaporation is negatively correlated with the trees: the more trees there are, the less evaporation there is. As future work, one could look for relations between other land cover and meteorological variables by drawing scatter plots and fitting a curve to the data, in order to detect non-monotonic or non-linear dependence.

Spreading fire based on the type of land cover

First, to analyze the spread of a fire, let us look at the distribution of the number of fires in a grid cell during a month.

Figure 29: Histogram representing the number of wildfires in a grid cell from 1993 to 2015

We notice that most of the values are close to 0 and that, because of the outliers, the distribution cannot be seen clearly. We thus filter out the outliers to get a better view of the distribution.

Figure 30: Histogram representing the number of wildfires in a grid cell from 1993 to 2015 (without outliers)

Since the aim of this part is to predict the spread of a fire given the land cover data, let us now plot the distribution of the number of fires according to each land cover variable.

Figure 31: Distribution of the number of fires according to each land cover

Most of the plots show a distribution that resembles a Poisson distribution, or a power law in certain cases.

In this section, we try to model whether there was a fire in a grid cell in a given month, given the land cover composition of the cell. The difficulty is that the data record the number of fires in the cell, not just their presence. One option is zero-inflated Poisson regression, which is designed for count data with excess zeros and the resulting overdispersion, a description that fits our data well. It combines a Poisson model for the counts with a logit model for the excess zeros.
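As a minimal sketch of the zero-inflated Poisson distribution itself (not of the fitted model): with probability \(p\) an observation is a structural zero, otherwise it is drawn from a Poisson distribution with mean \(\lambda\). The function name `dzip` and the parameter values below are illustrative assumptions.

```r
# Probability mass function of a zero-inflated Poisson distribution.
# p_zero: probability of a structural (excess) zero.
# lambda: mean of the Poisson part.
dzip <- function(k, lambda, p_zero) {
  p_zero * (k == 0) + (1 - p_zero) * dpois(k, lambda)
}

lambda <- 2
p_zero <- 0.3

# Share of zeros under the plain Poisson and under the zero-inflated version.
p0_poisson <- dpois(0, lambda)
p0_zip <- dzip(0, lambda, p_zero)

# Mean and variance of the zero-inflated distribution, computed from the pmf
# (the tail beyond 200 is negligible for this lambda).
k <- 0:200
zip_mean <- sum(k * dzip(k, lambda, p_zero))
zip_var <- sum((k - zip_mean)^2 * dzip(k, lambda, p_zero))
print(c(p0_poisson = p0_poisson, p0_zip = p0_zip,
        mean = zip_mean, variance = zip_var))
```

The variance exceeds the mean, which a plain Poisson distribution cannot reproduce; this is the overdispersion that the extra zeros induce.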


Call:
zeroinfl(formula = CNT ~ lc1 + lc2 + lc3 + lc4 + lc5 + lc6 + 
    lc7 + lc8 + lc9 + lc10 + lc11 + lc12 + lc13 + lc14 + lc15 + 
    lc16 + lc17 + lc18, data = df)

Pearson residuals:
     Min       1Q   Median       3Q      Max 
 -2.6002  -0.6556  -0.4760  -0.1269 164.8916 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.4275     0.1150  -3.718 0.000201 ***
lc1           3.8936     0.1317  29.561  < 2e-16 ***
lc2           1.7546     0.1152  15.234  < 2e-16 ***
lc3           0.1621     0.1427   1.136 0.255880    
lc4           2.5901     0.1170  22.144  < 2e-16 ***
lc5           2.3983     0.1148  20.897  < 2e-16 ***
lc6          -7.7769     0.4706 -16.526  < 2e-16 ***
lc7           2.6995     0.1149  23.490  < 2e-16 ***
lc8           2.3755     0.1158  20.514  < 2e-16 ***
lc9           1.0370     0.1172   8.845  < 2e-16 ***
lc10          4.5284     0.1232  36.762  < 2e-16 ***
lc11          1.5169     0.1153  13.158  < 2e-16 ***
lc12          1.9933     0.1155  17.256  < 2e-16 ***
lc13         -1.5373     0.1658  -9.271  < 2e-16 ***
lc14          2.7590     0.1212  22.764  < 2e-16 ***
lc15          2.5966     0.1282  20.260  < 2e-16 ***
lc16          5.8753     0.1176  49.973  < 2e-16 ***
lc17         -0.3846     0.1310  -2.937 0.003314 ** 
lc18          1.4079     0.1162  12.121  < 2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -9.8587     0.4163  -23.68  < 2e-16 ***
lc1           9.8726     0.4567   21.62  < 2e-16 ***
lc2          11.9956     0.4168   28.78  < 2e-16 ***
lc3          12.7570     0.4947   25.79  < 2e-16 ***
lc4           9.9462     0.4243   23.44  < 2e-16 ***
lc5          10.0305     0.4162   24.10  < 2e-16 ***
lc6          18.0285     1.0028   17.98  < 2e-16 ***
lc7           8.0604     0.4164   19.36  < 2e-16 ***
lc8           9.4954     0.4201   22.60  < 2e-16 ***
lc9          11.0770     0.4201   26.37  < 2e-16 ***
lc10          1.2508     0.4685    2.67  0.00759 ** 
lc11         10.7596     0.4172   25.79  < 2e-16 ***
lc12         10.8828     0.4172   26.08  < 2e-16 ***
lc13         16.6698     0.5593   29.81  < 2e-16 ***
lc14          8.6888     0.4388   19.80  < 2e-16 ***
lc15          9.2563     0.4608   20.09  < 2e-16 ***
lc16          7.3972     0.4386   16.87  < 2e-16 ***
lc17         12.1465     0.4312   28.17  < 2e-16 ***
lc18         11.4647     0.4180   27.43  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 43 
Log-likelihood: -1.138e+06 on 38 Df

The output above first shows a block with the Poisson regression coefficients for each variable, along with the standard errors, z-scores and p-values. A second block follows for the zero-inflation model, with the logit coefficients for predicting excess zeros. All the predictors in both the count and the zero-inflation parts are statistically significant (all p-values are very small), except land cover 3 in the count model. In other words, the null hypothesis that the coefficient equals 0 is rejected for every other coefficient. This suggests that the model fits the data significantly better than the null model.
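To make the two blocks of coefficients more concrete, the sketch below plugs values copied from the output above into the two link functions, for a hypothetical cell fully covered by a single land cover class. It assumes that the land cover variables are proportions between 0 and 1, so that the other classes are then 0. The helper `expected_cnt` is illustrative, and the results are extrapolations rather than findings.

```r
# Expected count under a zero-inflated Poisson model:
# E[CNT] = (1 - P(structural zero)) * Poisson mean.
expected_cnt <- function(count_lp, zero_lp) {
  lambda <- exp(count_lp)      # log link of the count model
  p_zero <- plogis(zero_lp)    # logit link of the zero-inflation model
  (1 - p_zero) * lambda
}

# Linear predictors (intercept + coefficient) for a cell made entirely
# of lc16, then entirely of lc6; values copied from the output above.
cnt_lc16 <- expected_cnt(-0.4275 + 5.8753, -9.8587 + 7.3972)
cnt_lc6  <- expected_cnt(-0.4275 - 7.7769, -9.8587 + 18.0285)
print(c(lc16 = cnt_lc16, lc6 = cnt_lc6))
```

A land cover class thus acts twice: through the count part (how many fires when fires are possible) and through the zero-inflation part (how likely the cell is to be a structural zero).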

Prediction of the number of fires depending on the land covers:

Using the previous results, we now predict the value of CNT. First, let us check that the model fits the data. We use hold-out validation: 70% of the dataset is used to train the model and the remaining 30% to test it.

         R2     RMSE     MAE
1 0.1707637 5.906671 2.51512

To measure the goodness of fit of our model, we compute the R squared, the root mean squared error (RMSE) and the mean absolute error (MAE). Here the R squared is about 0.17, meaning that the model explains about 17% of the variance of CNT. The lower the RMSE, the better the model fits the data. Here it is about 5.9, on a test set of the order of 160 000 points. Finally, the MAE is about 2.5: on average, the prediction is 2.5 fires away from the true value. The RMSE is more than twice the MAE, which indicates that a few large errors dominate.
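The split and the three metrics can be sketched as follows on simulated data. A plain Poisson GLM stands in for the zero-inflated model so that the example runs with base R only. The data, the column `x`, and the definition of R squared as the squared Pearson correlation between predictions and observations are all assumptions.

```r
set.seed(42)

# Hypothetical stand-in for the real data: one covariate and a count outcome.
n <- 1000
df <- data.frame(x = runif(n))
df$CNT <- rpois(n, lambda = exp(0.5 + 1.5 * df$x))

# 70% of the rows for training, the remaining 30% for testing.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df[train_idx, ]
test <- df[-train_idx, ]

# A plain Poisson GLM stands in for the zero-inflated model here.
fit <- glm(CNT ~ x, family = poisson, data = train)
pred <- predict(fit, newdata = test, type = "response")

# R2 is taken as the squared Pearson correlation (an assumption).
metrics <- data.frame(
  R2 = cor(pred, test$CNT)^2,
  RMSE = sqrt(mean((pred - test$CNT)^2)),
  MAE = mean(abs(pred - test$CNT))
)
print(metrics)
```

Because the RMSE squares the errors before averaging, it is never smaller than the MAE, and the gap between the two widens when a few predictions are far off.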

To improve the model, we tried feature selection in R (backward selection). It takes about 20 minutes to run; the code is available in the file predictCNTandBA.R. It only removes clim4 from the model, so we keep the model as it was. We then predict the missing values in our dataframe and replace the NA values by the predictions.

Figure 32: Plot of Predicted vs. Observed Values of CNT

Prediction of the aggregated burnt areas

After trying different models, we obtain a good result with a generalized linear model with Gaussian family:

         R2     RMSE      MAE
1 0.3978969 3477.453 342.4689

As can be seen, the model explains about 40% of the variance of the variable. However, the RMSE and MAE are still large, which can occur with large data. The RMSE is about ten times the MAE, which suggests that a few very large errors dominate. Let us plot the predictions against the true values for the test set.

Figure 33: Plot of Predicted vs. Observed Values of BA

Now we replace the NA values in the dataframe by the predicted ones.

Wildfires and meteorological factors

In this section, we try to find some meteorological factors that could potentially trigger a wildfire. To do so, we will plot a Pearson correlation heatmap. The first step is to get a subset from our initial dataset that is more representative of wildfire conditions. As a fire is quite a rare event, we would not be able to find any links if we considered the entire population (the correlations would be around 0).

To get our subset, we only consider areas per month which have at least a certain number of wildfires and a burnt area above a certain threshold. For our example, we take rows with at least 60 wildfires (\(CNT \geq 60\)) and at least 5000 acres of aggregated burnt area (\(BA \geq 5000\), approximately 20 km²) in the current month.

We check whether the period in years of the subset matches that of the initial dataset, i.e. 1993 to 2015. We see that it does.

range(selection$year)
[1] 1993 2015
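The subsetting step and the correlation matrix behind such a heatmap might look like the sketch below. The miniature data frame `df` and its `temp` column are hypothetical; only the thresholds and the names `CNT`, `BA`, `year` and `selection` come from the text.

```r
# Hypothetical miniature of the data set, one row per grid cell and month.
df <- data.frame(
  year = c(1993, 2001, 2007, 2015, 2015),
  CNT = c(75, 12, 90, 61, 200),
  BA = c(8000, 300, 4000, 5200, 12000),
  temp = c(22.1, 15.3, 19.8, 24.0, 26.5)
)

# Keep rows with at least 60 wildfires and at least 5000 acres burnt.
selection <- subset(df, CNT >= 60 & BA >= 5000)

# Check the time span of the subset, then compute Pearson correlations.
print(range(selection$year))
cor_matrix <- cor(selection[, c("CNT", "BA", "temp")], method = "pearson")
print(cor_matrix)
```

Both conditions must hold for a row to be kept, so a month with many small fires or with a single large fire is excluded.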

As a result, a correlation heatmap will allow us to distinguish certain factors that could favor a wildfire (Figure 34).

Figure 34: Correlation heatmap after subsetting (explained above) with meteorological variables

Since we are looking for meteorological risk factors, we may only consider the first two rows or columns of the heatmap.

First, we see that no pair of variables is strongly correlated: the highest absolute value is 0.43. The number of wildfires and the month are negatively correlated. This can be explained by the fact that months range from March to September and that there are more occurrences of fire in March and April than in August and September. The fact that the values are small suggests that the meteorological causes of a wildfire are multifactorial.

An interesting observation is that \(CNT\) and \(BA\) are negatively correlated, meaning that the more area burnt, the fewer wildfires there are. Most of the time, the signs of the correlations in the first two rows are not the same for these two variables. For example, the higher the temperatures and the solar radiation, the more area burnt, but the fewer wildfires. This can be explained by the fact that our subset contains substantial wildfires, so there are not many, but they are destructive.

Another variable that we can comment on is precipitation. For an area to burn, there must be no rain and no high humidity, hence the negative correlation between \(BA\) and precipitation. The positive correlation with \(CNT\) could be explained by unstable weather conditions. As with wind speed, the higher it is, the more unstable the air masses and the greater the risk of a natural disaster.

But we should be careful with these correlations as we used a subset and as their values are not that high in absolute value.

Next, a map of the US represents the areas considered in the subset, regardless of months and years (Figure 35).

Figure 35: US map representing the areas considered in the subset (explained above), regardless of months and years. This means that if a particular area appears at least at one time in the subset, it will be shown by a red square on the map.

It can be seen that the areas represented are few in number and are located where the forest cover is most extensive. For example, there are almost no red squares in the central region of the United States, as this area is mostly non-forest land.

For other thresholds, an interactive app is available here. Instructions and explanations are displayed on the app. Basically, you have the same results as above, but thresholds for \(CNT\) and \(BA\) are entered by the user.

Effect of wind on the spread of wildfires

In this section, we want to model the effect of wind on the spread of wildfires for a given month and year. We define a correlation between wind and aggregate fire direction for each square area of the US map. The “aggregate fire direction” is obtained by summing up to 8 vectors, one per neighbour of the area, and normalizing the resultant to a unit vector. Each vector starts from the central area and points towards the neighbour in question. Its magnitude is the value of \(CNT\) (number of wildfires) or \(BA\) (aggregated burnt area of wildfires in acres) of that neighbour, depending on the selected criterion. The wind vector is simply the normalized wind direction of the central area. The correlation is then the dot product of these two vectors. Since they are normalized, the result ranges from -1 to 1: -1 means that their directions are opposite (negatively correlated), 0 that they are perpendicular (no correlation) and 1 that their directions are the same (positively correlated). Figure 36 illustrates the above explanations. More about quantifying the correlation between two vectors can be found here.

Figure 36: Example for the calculation of the correlation with 8 neighbours. Colors represent the magnitude of the value of CNT or BA (red being the highest and yellow the lowest). The norm of the vectors is adjusted according to the latter. \(\vec{R}\) is the resultant of the vectors and corresponds to the aggregate fire direction. \(\vec{W}\) is the vector for wind. The dot product requires a normalization of the vectors to obtain a correlation between both directions.
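The computation illustrated in Figure 36 can be sketched as follows. This is one possible reading of the description, not the report's code: the function names, the ordering of the neighbours and the convention that the wind vector points where the wind blows towards are assumptions.

```r
# Offsets from the central cell to its 8 neighbours (dx: east, dy: north).
offsets <- expand.grid(dx = -1:1, dy = -1:1)
offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]

# Scale a vector to unit length.
unit <- function(v) v / sqrt(sum(v^2))

# weights: CNT or BA of each neighbour, in the row order of `offsets`
#          (use 0 for a neighbour that is missing or has no fire).
# wind:    wind vector of the central cell.
wind_fire_correlation <- function(weights, wind) {
  dirs <- t(apply(as.matrix(offsets), 1, unit))  # unit vector per neighbour
  resultant <- colSums(dirs * weights)           # aggregate fire direction
  if (all(resultant == 0) || all(wind == 0)) {
    return(NA_real_)                             # direction undefined
  }
  sum(unit(resultant) * unit(wind))              # dot product of unit vectors
}

# Toy example: all the burnt area lies in the eastern neighbour.
weights <- ifelse(offsets$dx == 1 & offsets$dy == 0, 10, 0)
east <- wind_fire_correlation(weights, c(1, 0))    # wind blowing east
west <- wind_fire_correlation(weights, c(-1, 0))   # wind blowing west
north <- wind_fire_correlation(weights, c(0, 1))   # wind blowing north
print(c(east = east, west = west, north = north))
```

Returning `NA` when either vector has zero length avoids dividing by zero for cells without burning neighbours or without wind data.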

Let us now see how these correlations are represented in a US map. We plot them for the month of July of the year 2015 (see Figure 37) and we choose to consider fire with variable \(BA\).

Figure 37: US map representing correlations between wind and fire aggregate direction (calculated according to the neighbours by considering fire with variable \(BA\)) in July of year 2015

At first glance, we notice that there are more cases of positive and negative correlation than cases of non-correlation. Green squares are often surrounded by other green squares, forming areas where the wind would move the fire and increase its strength. The same goes for the red squares, which form areas where the wind would also move the fire, but decrease its strength. There are fewer areas of non-correlation. To get a better idea of the distribution of these correlation values, they should be represented on a histogram. Figure 38 represents the distribution of correlations between wind and fire aggregate direction.

Figure 38: Histogram representing the distribution of correlations between wind and fire aggregate direction (calculated according to the neighbours by considering fire with variable \(BA\)) with median (red dashed line) in July of year 2015

The histogram reflects what we observed on the map. Correlations are distributed almost evenly around 0, with a surplus on each side. This suggests that there is indeed a relation between the spread of fire and wind. In the first case, the fire propagates in the direction of the wind, which can be explained by the wind pushing the flames with greater intensity, in particular when its speed is high. As Bonsor (2001) explains, “winds supply the fire with additional oxygen, further dry potential fuel and push the fire across the land at a faster rate”. In the second case, the fire goes against the wind, which can have several explanations. According to the same article, wildfires can generate winds called “fire whirls”, which result from vortices created by the heat of the fire. Another explanation for this negative correlation is related to topography: a fire that is trying to move up a slope can be stopped by air flows going down. For further explanations of the phenomenon, we refer the reader to the above-mentioned article.

Note that there are some shortcomings in this way of modelling the phenomenon. For example, we are not sure that the fire really propagates when the neighbours are also affected, especially as the surface of a square is rather large (approximately 2000 km²). In addition, we work with monthly averages, whereas the wind continually changes in direction and in intensity. This necessarily affects the accuracy of the results.

A web application is available here to view these results for other months and years. There is also the possibility of taking into account the fire according to the number of wildfires in the month for a given area (\(CNT\)). We notice the same general trend for each month, including when the criterion is changed to \(CNT\).

Subgroup analysis

Quantile regression

In the preceding analysis, we have seen that many of the fires arise in a small subset of geographical gridpoints. In other words, there is considerable heterogeneity between the geographical gridpoints. Furthermore, in the interactive app which explored meteorological correlations, it was found that meteorological variables were more strongly correlated with the number of wildfires in subgroups of geographical gridpoints with a large number of fires. This motivated us to conduct an explicit subgroup analysis by employing quantile regression. We employed the linear quantile regression model \[ Q_Y(\tau\mid X) = a_0(\tau) + b_0(\tau)X \] where \(Y\) is the outcome (number of wildfires (\(N\)) or the burnt area (\(A\))), \(X\) is the exposure covariate (risk factor) and \(Q_\cdot(\tau\mid X)\) is the conditional quantile corresponding to percentile \(\tau\) for exposure level \(X\). This model allows us to explore dynamic covariate effects of \(X\) across strata of \(\tau\). In Figure 39 and Figure 40, we have fit the quantile regression model with \(X=\) temperature for a random sample of 10 000 geographical gridpoints from the year 2015.
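For reference, the coefficients of a linear quantile regression are commonly estimated by minimizing the check (or pinball) loss. This is the standard formulation; we assume, without confirmation from the analysis itself, that the software used here follows it:

\[ \left(\widehat{a}_0(\tau), \widehat{b}_0(\tau)\right) = \arg\min_{a,b} \sum_{i=1}^n \rho_\tau\left(Y_i - a - bX_i\right), \qquad \rho_\tau(u) = u\left(\tau - I(u<0)\right) ~.\]

For \(\tau = 0.5\), \(\rho_\tau(u) = |u|/2\), so the fit reduces to median regression. For larger \(\tau\), positive residuals are penalized more heavily than negative ones, which pulls the fitted line towards the upper part of the conditional distribution of \(Y\).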

Figure 39: Quantile regression for the effect of temperature on number of wildfires

Figure 40: Quantile regression for the effect of temperature on burnt area

Whereas the association between temperature and wildfire outcomes (i.e. number of wildfires and burnt area) is not very strong in the marginal analyses conducted earlier (unconditional on \(\tau\)), we observe a very clear association within strata given by percentiles of number of wildfires or burnt area. This indicates that temperature has a heterogeneous effect on wildfires: temperature appears to increase the occurrence of wildfires most in areas that are prone to them. As a negative control, we have repeated the analysis with the exposure \(X\) being the proportion \(W\) of land cover constituted by water in Figure 41 and Figure 42.

Figure 41: Quantile regression for the effect of water land cover on number of wildfires

Figure 42: Quantile regression for the effect of water land cover on burnt area

The negative control shows what we expect: areas which are susceptible to wildfires experience substantially fewer of them the larger the proportion of land cover that is constituted by water. These illustrations highlight the power of quantile regression in elucidating heterogeneous covariate effects across units.

The strong association between temperature and wildfires at high quantiles of the wildfire outcomes prompts us to consider how the yearly mean temperature has evolved with time, as this may tell us about the risk of wildfires in the future if temperatures continue to rise. In Figure 43, we observe a weakly rising trend (we have chosen vertical axis limits to start above zero to highlight variations in temperature). We therefore remark that we may experience an exacerbation of wildfires in the future, if temperatures continue to rise.

Figure 43: Temporal evolution of mean temperature across time

Survival analysis on the incidence of wildfires

Having considered the subgroup of gridpoints that are highly susceptible to wildfires, a next step was to use survival analysis to study how temperature affects the incidence of wildfires. Cox proportional hazards regression was used to characterize how temperature affects the survival probability with respect to wildfire events. The survival probability is defined as the probability of not experiencing any wildfire event by month \(t\) during a given year. To fix ideas, we first considered the survival during year 2015. The Kaplan-Meier estimator of the survival probability is given by

\[ \widehat{P}(T>t) = \prod_{T_i\leq t} \left( 1- \frac{\Delta N_{T_i}}{Z(T_i)} \right)~,\]

where \(T_i\) is the time of the first wildfire in the geographical gridpoint \(i\) during the year 2015, \(\Delta N_{T_i}\) is the number of gridpoints experiencing their first wildfire at time \(T_i\), and \(Z(t)=\sum_{i=1}^n I(T_i\geq t)\) is the number of gridpoints at risk of experiencing a first wildfire event just before time \(t\). The product runs over the distinct event times. The Kaplan-Meier estimator for the year 2015 has been plotted in Figure 44 below together with \(95\%\) confidence intervals, shown by the dotted lines. As can be seen in the plot, the first wildfire often occurs during March.

Figure 44: Kaplan-Meier estimator of the survival with respect to wildfire events during 2015. Confidence intervals (\(95\%\)) are shown by dotted lines.
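The product formula above can be evaluated by hand in a few lines. The sketch below uses hypothetical data (month of the first wildfire for eight gridpoints, three of which are censored at month 12) and an illustrative helper `kaplan_meier`. It does not compute confidence intervals, and a dedicated survival routine would normally be used in practice.

```r
# Hypothetical data: month of the first wildfire per gridpoint.
# event = 0 means no wildfire was observed by month 12 (censored).
time <- c(3, 3, 4, 6, 7, 12, 12, 12)
event <- c(1, 1, 1, 1, 1, 0, 0, 0)

kaplan_meier <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  # Z(t): gridpoints still at risk just before each event time.
  at_risk <- sapply(event_times, function(t) sum(time >= t))
  # Delta N: first wildfires occurring exactly at each event time.
  n_events <- sapply(event_times, function(t) sum(time == t & event == 1))
  data.frame(time = event_times,
             survival = cumprod(1 - n_events / at_risk))
}

km <- kaplan_meier(time, event)
print(km)
```

In this toy example every censoring happens after the last event, so each survival value equals the fraction of the eight gridpoints that has not yet burnt.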

Next, our analysis employed a Cox proportional hazards regression model of the type

\[ \alpha(t\mid T_C; \beta) = \alpha_0(t)\exp\left(\beta \cdot T_C\right)\] where \(\alpha_0(t)\) is a baseline hazard, \(T_C\) is the temperature (in degrees Celsius) and \(\beta\) is the regression coefficient for \(T_C\), which is to be estimated. Thus, the hazard ratio for an increase by 1 degree Celsius is given by \[HR=\frac{\alpha(t\mid T_C; \beta)}{\alpha(t\mid T_C-1; \beta)}=\exp(\beta) ~.\] For the year 2015, the Cox proportional hazards regression gives the following results:

Call:
coxph(formula = Surv(start, stop, event) ~ Temp, data = df.truncated.sample)

  n= 3460, number of events= 746 

         coef exp(coef) se(coef)     z Pr(>|z|)    
Temp 0.051728  1.053090 0.005771 8.963   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
Temp     1.053     0.9496     1.041     1.065

Concordance= 0.638  (se = 0.013 )
Likelihood ratio test= 77.71  on 1 df,   p=<2e-16
Wald test            = 80.34  on 1 df,   p=<2e-16
Score (logrank) test = 81.36  on 1 df,   p=<2e-16

Thus, temperature has a statistically significant effect on the hazard of wildfire events. The analysis was also repeated for other years: the resulting hazard ratio estimates are shown with \(95\%\) confidence intervals in Figure 45 below.
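The hazard ratio and its confidence interval in the output above follow directly from the coefficient and its standard error. A minimal check, with both values copied from that output and variable names chosen for illustration:

```r
# Values copied from the coxph output above.
beta <- 0.051728
se <- 0.005771

# Hazard ratio per +1 degree Celsius.
hr <- exp(beta)

# Wald 95% confidence interval, built on the log scale and exponentiated.
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

# Hazard ratio per +10 degrees Celsius under the same model.
hr_10 <- exp(10 * beta)

print(round(c(HR = hr, lower = ci[1], upper = ci[2]), 3))
```

The rounded values reproduce the 1.053, 1.041 and 1.065 reported by `coxph`. Since the model is multiplicative, the hazard ratio for a larger temperature difference is obtained by scaling \(\beta\) before exponentiating, as done for `hr_10`.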

Figure 45: Evolution of hazard ratios over time. Confidence intervals (\(95\%\)) are shown by dotted lines. The red line represents the null value of the hazard ratio.

Most years, the confidence intervals do not include the null value of the hazard ratio, shown as a red line on the above plot. This may suggest that temperature not only exacerbates wildfires in high risk geographical locations, but is also associated with new occurrences of wildfires in areas where they do not usually occur. This may give an indication of the scale of evolution of wildfires in the future, if temperatures continue to rise.

Conclusion

In this investigation, we have studied three sets of risk factors for wildfires: (1) the time period, (2) the type of land cover and (3) local meteorological conditions. Before delving into the detailed analyses, we conducted a preliminary and entirely visual investigation of how the number of fires and the burnt area interact with the different climatic and spatial variables. These results gave us a first idea of what to expect from the more precise analyses that followed.

The interactive map as a function of time let us analyse the influence of the time period within a year and across the years. Although the map looked very heterogeneous, some particular trends were observed for specific periods of the year. Also, the global trend for burnt area was periodic and slightly increasing over the years, whereas this was not the case for the number of wildfires. In our exploration of land cover, we found that the proportion of urban areas increased from 1993 to 2005. Fires are often caused by urban activity, so this trend could lead to more wildfires in the future.

In our meteorological exploration, we found that the cause-effect relation of meteorological variables on wildfires may be highly multifactorial. The marginal correlations between the number of fires and meteorological variables were small in the population. To disentangle the effect of meteorological variables from the effect of land cover on fires, we examined the joint distribution of these variables. We found that the correlations between meteorological variables and land cover types were weaker than expected, which makes it more feasible to interpret these variables as independent causes of wildfires. We built models using both land covers and meteorological variables to describe the number of fires and the aggregate burnt area. Thanks to these models, we predicted the missing values in our dataframe.

In addition, we tried to model the effect of wind on the spread of wildfires. The results as well as the literature indeed show that wind plays a major role in the dynamics of wildfires, especially by supplying them with oxygen. The direction as well as the intensity of the fires vary according to the parameters of the current wind.

Most of our analysis has targeted marginal associations between risk factors and the number of wildfires or aggregate burnt area (i.e. unconditional on subsets of geographical gridpoints). These correlations were often weak, but grew stronger in subsets of the population with a greater number of wildfires. This motivated us to characterize the heterogeneity in risk of wildfires by conducting subgroup analyses. To do so, we performed quantile regression, which revealed that temperature substantially exacerbated the risk of wildfires in areas which are prone to wildfires. This is in spite of the fact that temperature was only weakly associated with the number of wildfire events marginally. We also performed survival analysis for the time to the first wildfire event of the year using Cox proportional hazards regression, which revealed a significant association between temperature and the hazard of the first wildfire event.

Furthermore, we have seen a weakly rising trend in the temperature over the past years. Coupled with our observation that the proportion of urban land cover has been rising steadily with time, and that human activity is an important cause of wildfires, we note that we may see an increased occurrence of wildfires in the years to come.

Future improvements

An extension to our analysis would be to use our data to make predictions. It would also be advantageous to find a more recent dataset, since ours ends in 2015. We might then perceive a stronger effect of global warming, and the previous correlations might appear more distinctly.

Another improvement would be to look at other regions that are subject to wildfires, such as Australia or African countries. This would increase our dataset and help us capture the main factors and causes by comparing the areas (more heterogeneity in our data).

Finally, our data may contain fires of criminal origin. In this analysis, we have assumed that the causes are of natural origin. An improvement would be to take this into account and possibly identify and exclude those cases, which are not of interest in this context.

  1. The given dataset was too large to be imported in Python: it is a compressed Rdata file that exceeds 300 MB when converted to a CSV file, and is of similar or larger size once imported in Python. To deploy Python applications, one can use Heroku, but the maximum memory available for computation is 512 MB, which is too small for this dataset. One could try to minimize memory use, but this would be almost impossible, since free memory of at least twice the size of the dataset would be needed to make the different plots (the file must first be imported, and breaking it down into several DataFrame objects would take at least as much memory as the whole dataset). Another approach is to use R instead and deploy the application on shinyapps.io, because Rdata is believed to be handled more efficiently by R, both in computation and in memory.

References

Alejandra Borunda. 2021. The Science Connecting Wildfires to Climate Change. National Geographic. https://www.nationalgeographic.com/science/article/climate-change-increases-risk-fires-western-us.

André Gabrielli. 2019. Wildfires. National Geographic. https://www.nationalgeographic.org/encyclopedia/wildfires/.

Aria Bendix. 2018. Before-and-After Photos Show the Devastating Destruction in Malibu as the California Wildfires Rage on. Business Insider. https://www.businessinsider.com/california-wildfires-photos-malibu-woolsey-fire-2018-11?r=US&IR=T.

Barros, Ana M. G. and Pereira, José M. C. 2014. Wildfire Selectivity for Land Cover Type: Does Size Matter? PloS One. Vol. 9. Public Library of Science.

Bonsor, Kevin. 2001. How Wildfires Work. HowStuffWorks. https://science.howstuffworks.com/nature/natural-disasters/wildfire.htm.

Holly Yan, Cheri Mossburg, Artemis Moshtaghian and Paul Vercammen. 2020. California Sets New Record for Land Torched by Wildfires as 224 People Escape by Air from a ’Hellish’ Inferno. CNN. https://edition.cnn.com/2020/09/05/us/california-mammoth-pool-reservoir-camp-fire/index.html.

Mert Ozkan and Ezgi Erkoyun. 2021. Turkish Wildfires Are Worst Ever, Erdogan Says, as Power Plant Breached. Reuters. https://www.reuters.com/world/middle-east/fire-near-turkish-power-plant-under-control-local-mayor-2021-08-04/.

Michael Slezak. 2020. 3 Billion Animals Killed or Displaced in Black Summer Bushfires, Study Estimates. ABC News. https://www.abc.net.au/news/2020-07-28/3-billion-animals-killed-displaced-in-fires-wwf-study/12497976.

Sarah Gibbens. 2021. Wildfire Smoke Blowing Across the U.S. Is More Toxic Than We Thought. National Geographic. https://www.nationalgeographic.com/environment/article/wildfire-smoke-blowing-across-country-more-toxic-than-we-thought.

Schaefer, Alexander J. and Magi, Brian I. 2019. Land-Cover Dependent Relationships Between Fire and Soil Moisture. Fire. Vol. 2. Multidisciplinary Digital Publishing Institute.

Steven Reinberg. 2021. Wildfires Cause More Than 33,000 Deaths Globally Each Year. U.S. News. https://www.usnews.com/news/health-news/articles/2021-09-09/wildfires-cause-more-than-33-000-deaths-globally-each-year.

Topher Gauk-Roger, Stella Chan, Jason Hanna and Steve Almasy. 2020. California Wildfires: Fire Chief Says Dozens of Major Blazes Have State in ’Dire Situation’. CNN. https://www.cnn.com/2020/09/08/us/california-fires-tuesday/index.html.